ABSTRACT

Pinboards on Pinterest are an emerging medium for engaging online social media users, on which users post online images about specific topics. Despite their significance, there is little previous work specifically aimed at facilitating information discovery based on pinboards. This paper proposes a novel pinboard recommendation system for Twitter users. To associate content from the two social media platforms, we propose to use multi-label classification to map the followees of Twitter users to pinboard topics, and visual diversification to recommend pinboards given the topics a user is interested in. A preliminary experiment on a dataset of 2000 users validates the proposed system.

Categories and Subject Descriptors 

H.3.5 [Online Information Services]: Web-based services

General Terms 

Experimentation 
1. INTRODUCTION

Social curation services [3], on which people share and reshare online content they find interesting, are becoming increasingly popular. For example, Pinterest has attracted more than 40 million monthly active users¹ who create pinboards and share online images. Social curation creates high-quality content for users to discover, search and follow according to their interests. For example, Pinterest already delivers better search results than Google in a number of segments, such as food recipes, fashion and DIY². A pinboard, on which people put together online images on a specific topic to share with others, is the key element Pinterest uses to engage users. Unlike traditional photo albums, such as those on Facebook and Flickr, pinboards are not merely another way for people to organize images; they also encourage users to select images carefully and creatively. Moreover, to keep attracting followers, pinboard curators make an effort to keep their boards up to date, so pinboards give users a new way to follow the freshest updates on the topics they are interested in. As with other online systems, efficient information discovery is one of the major hurdles to continued growth. In this paper, we propose a pinboard recommendation system to help users discover interesting pinboards on Pinterest.

¹ eMarketer, Feb 2015
² http://goo.gl/PCWG3W

Figure 1: An illustrative example of recommending pinboards to a Twitter user “@xyz”

As a relatively new social media platform, Pinterest is still young, so retaining new users is important. However, recommendation for new users suffers from the cold-start problem because they lack activity history. We propose to leverage third-party social media platforms, such as Twitter, to address the cold-start problem for new Pinterest users. There are several reasons to use Twitter as an intermediate platform for recommending pinboards. First, Twitter is more mature and has even more active users, so it covers a larger population of online users. Second, because most content on Twitter is text-based, it is much easier to identify topics of interest on Twitter than on Pinterest. Third, although a text-based platform makes it easier to find interesting topics, image-based pinboards, if well aligned with these interests, provide better content for users to digest. For these reasons, recommending pinboards based on Twitter activity history is a natural fit.

Figure 2: Overview of the proposed recommendation system. First, we align and filter the data from the two sources. We then collect multimedia content, such as pinboard and pin information, as well as tweets from these active users. Meanwhile, we build the pinboard topic ontology automatically by pruning an expert ontology such as Wikipedia. Second, we perform user modeling on both platforms. For Pinterest, we obtain a multi-label representation of each account by mapping its boards to the topics in the Pinterest ontology. For Twitter, we extract both text features and personality features from the user timeline. Third, we associate the two user models using multi-label classification, in which the Twitter features are used to predict the Pinterest category representation of each user. Finally, we recommend diverse Pinterest boards according to this prediction, where diversity is promoted by exploiting the visual content of the images in each board.

We propose the following procedure to recommend pinboards to Twitter users. (1) Given a Twitter user, we extract the user's followees, assuming that following relationships reflect user interests. (2) Using the proposed association method, we map the followees to pinboard topics, which represent the kinds of boards the Twitter user is likely to be interested in. Noise from individual followees is reduced by aggregating all mapping results. (3) We choose a number of pinboards from the selected topics and recommend these pinboards to the Twitter user.
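The aggregation in step (2) can be sketched as follows. The paper does not specify the aggregation rule, so this minimal sketch assumes that each followee's mapping result is a dictionary of topic scores in [0, 1], that aggregation is a plain average over followees, and that step (3) draws from the top-n topics. The function name `aggregate_followee_topics` is hypothetical.

```python
from collections import defaultdict


def aggregate_followee_topics(followee_scores, top_n=3):
    """Average per-followee topic scores and return the top_n (topic, score) pairs.

    followee_scores: list of dicts mapping topic -> score in [0, 1], one per
    followee (hypothetical output of the followee-to-topic mapping step).
    """
    totals = defaultdict(float)
    for scores in followee_scores:
        for topic, score in scores.items():
            totals[topic] += score
    n = max(len(followee_scores), 1)  # avoid division by zero for users with no followees
    averaged = {topic: total / n for topic, total in totals.items()}
    # Rank by averaged score (descending), breaking ties alphabetically.
    ranked = sorted(averaged.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


followees = [
    {"anti human trafficking": 0.9, "food": 0.2},
    {"anti human trafficking": 0.7, "fashion": 0.4},
    {"food": 0.6},
]
print(aggregate_followee_topics(followees, top_n=2))
```

Averaging means that a topic predicted for a single noisy followee receives a small score, while topics shared by many followees rise to the top.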

Using the example in Fig. 1, we find that Twitter user “@xyz” follows the Twitter user “@love146”. By collecting and classifying the timeline of “@love146”, we map this user to the pinboard topic “anti human trafficking”, so we assume that the target Twitter user “@xyz” is interested in boards about “anti human trafficking”. Therefore, we select a number of boards on this topic. We further improve the recommendation quality with a reranking method based on visual diversity analysis.

Since followees can be retrieved trivially through the Twitter API, this paper addresses the following two key problems, as illustrated in Fig. 2.

Mapping user timelines to pinboard topics is an example of cross-network analysis [10], and we build the association by mining users who are active on both Pinterest and Twitter. We crawl the timelines and pinboards of users who are active on both platforms. The pinboards are then mapped onto a predefined topic ontology using text-based matching, so that each user is associated with a Twitter timeline and a distribution over pinboard topics. We then learn a multi-label classifier that maps user timelines to pinboard topics. Using this multi-label classifier, we can map Twitter followees without a Pinterest account to pinboard topics.
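The train-then-transfer idea can be illustrated with a small sketch: one classifier per pinboard topic is fitted on users active on both platforms and then applied to a followee's timeline features. This is a minimal sketch assuming a Binary Relevance decomposition with per-label logistic regression trained by plain gradient descent on dense feature lists; the classifier actually used in this paper (RAkEL, Section 4.2) differs, and all function names and toy features here are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_binary_relevance(X, Y, labels, epochs=500, lr=0.5):
    """Train one logistic-regression model per label (Binary Relevance).

    X: feature vectors (lists of floats) of users active on both platforms.
    Y: label sets (pinboard topics) of the same users.
    Returns a dict label -> (weights, bias).
    """
    dim = len(X[0])
    models = {}
    for label in labels:
        w, b = [0.0] * dim, 0.0
        targets = [1.0 if label in y else 0.0 for y in Y]
        for _ in range(epochs):
            grad_w, grad_b = [0.0] * dim, 0.0
            for x, t in zip(X, targets):
                # Gradient of the log loss: (predicted probability - target) * input.
                err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
                for j in range(dim):
                    grad_w[j] += err * x[j]
                grad_b += err
            w = [wi - lr * gw / len(X) for wi, gw in zip(w, grad_w)]
            b -= lr * grad_b / len(X)
        models[label] = (w, b)
    return models


def predict_topics(models, x, threshold=0.5):
    """Return the topics whose predicted probability exceeds the threshold."""
    return {
        label
        for label, (w, b) in models.items()
        if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) > threshold
    }


# Toy features: [rate of food words, rate of fashion words] in a timeline.
X_train = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.8, 0.9]]
Y_train = [{"Food"}, {"Food"}, {"Fashion"}, {"Fashion"}, {"Food", "Fashion"}]
models = train_binary_relevance(X_train, Y_train, ["Food", "Fashion"])
print(predict_topics(models, [0.95, 0.05]))  # a followee without a Pinterest account
```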

Pinboard reranking is the process of recommending the best subset from the list of pinboards on the topics of interest. We adapt a clustering-based algorithm to perform this reranking, with the goal of ensuring that the recommended pinboards cover as many aspects of the selected topics as possible.

We make the following contributions in this paper. (1) We present and release a linked dataset that associates pinboards with Twitter users, which will be of interest to research beyond cross-network recommendation. (2) We propose a cross-network recommendation method based on multi-label classification. (3) We propose a visual reranking method to diversify pinboard recommendation.

2. RELATED WORK 

Our work is related to cross-network analysis, user modeling and hierarchical multi-label classification. In this section, we highlight the novelty of our work by comparing it with recent work in these related fields.

Cross-network analysis aims to merge social signals from different networks to increase engagement on online social media platforms. For example, Yan et al. [10] proposed to identify the best Twitter accounts for promoting YouTube videos by mining the associations between topics learned from user tweets and the users' favorite YouTube videos.

User modeling is the foundation of personalized services, such as personalized recommendation, search engine reranking and targeted advertising. One component of our Pinterest board recommendation system is based on Pinterest board ontology mapping, which is inspired by [2]. In that work, Geng et al. proposed a multi-task CNN to map Pinterest images to a fashion ontology, and the classification results were stored as user profiles for image recommendation. We adopt the idea that user interests can be represented by multiple nodes of an ontology, and further extend the ontology domain to 20 categories, including “Fashion”, “Food” and “Wedding”. In addition, instead of performing single-label classification on images and aggregating the classification results to form user profiles, we perform hierarchical multi-label classification on aggregated user tweets to infer user interests directly.

| Statistic | Value |
|---|---|
| total nodes | 471 |
| root nodes | 20 |
| leaf nodes | 440 |
| least/largest depth | 2/5 |

Table 1: Pinterest topic ontology statistics

Hierarchical multi-label classification is the problem of classifying data instances into multiple labels or attributes, where the labels are organized in a hierarchical taxonomy. Ren et al. [7] proposed a hierarchical multi-label system to classify short texts (e.g., tweets), using text expansion (e.g., entity linking) to deal with the shortness and concept drift problems in short text classification. Our Twitter user modeling is also based on hierarchical multi-label classification. However, instead of classifying single tweets, we process the entire timeline of each user, so that each user is modeled by a large number of tweets. In this paper, we adopt Random k-Labelsets (RAkEL) [8] to model the hierarchical dependencies among labels efficiently and automatically.

3. PINBOARD VISUAL DIVERSIFICATION 

The goal of this stage is to recommend the most relevant and diverse boards, given the predicted categories and millions of candidate boards sampled from the Pinterest board pool. To ensure relevance, we map each board onto the categories of the topic ontology with the method described in Section 4.1.2, and sort the boards of each category by popularity (e.g., number of followers). We then choose the top k boards of each category as its preliminary candidates (we set k = 10 in our experiments).
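The relevance step (group boards by category, rank by popularity, keep the top k) can be sketched as below. The record layout (`id`, `categories`, `followers`) and the function name are hypothetical, and follower count stands in for the popularity measure.

```python
from collections import defaultdict


def preliminary_candidates(boards, k=10):
    """Keep the k most-followed boards of each category.

    boards: list of dicts with keys "id", "categories" (set of topic names)
    and "followers" (int); a hypothetical record layout.
    Returns a dict category -> list of board ids, most followed first.
    """
    by_category = defaultdict(list)
    for board in boards:
        for category in board["categories"]:
            by_category[category].append(board)
    return {
        category: [b["id"] for b in sorted(members, key=lambda b: -b["followers"])[:k]]
        for category, members in by_category.items()
    }


boards = [
    {"id": "b1", "categories": {"Food"}, "followers": 120},
    {"id": "b2", "categories": {"Food", "Wedding"}, "followers": 950},
    {"id": "b3", "categories": {"Wedding"}, "followers": 40},
    {"id": "b4", "categories": {"Food"}, "followers": 300},
]
print(preliminary_candidates(boards, k=2))
```

A board mapped to several categories can appear in the candidate lists of each of them.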

To enhance diversity, we exploit the visual content of Pinterest: for each category, we cluster all pin images that belong to its preliminary board candidates, and then choose the boards that maximize the coverage of these clusters. Specifically, we use a deep Convolutional Neural Network (CNN) to extract features for the pin images (the fc7 layer of AlexNet [4]). The Affinity Propagation algorithm [1] is then applied to cluster the 4096-dimensional image features. After obtaining the clusters and the cluster assignment of each pin image, we build a cluster distribution for each board. The board selection problem can then be formalized as follows.
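Once every pin image has a cluster label, each board's cluster distribution is a normalized histogram of its pins' labels. The sketch below assumes the labels are already available (for instance from a clustering routine such as scikit-learn's `AffinityPropagation`, whose fitted `labels_` attribute gives one cluster index per sample); the CNN feature extraction and clustering themselves are not shown, and the function name is hypothetical.

```python
from collections import Counter


def board_cluster_distribution(pin_clusters, num_clusters):
    """Normalized histogram of cluster assignments for one board's pins.

    pin_clusters: cluster indices in [0, num_clusters), one per pin image.
    """
    counts = Counter(pin_clusters)
    total = len(pin_clusters)
    return [counts[c] / total for c in range(num_clusters)]


print(board_cluster_distribution([0, 0, 2, 1, 0, 2], num_clusters=4))
# → [0.5, 0.16666666666666666, 0.3333333333333333, 0.0]
```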

Denote the candidate set by $\mathcal{B}$, a subset of it by $B \subset \mathcal{B}$, a board in the subset by $b \in B$, and the cluster assignment distribution of board $b$ by $v_b \in \mathbb{R}^C$, where $C$ is the number of clusters over all images in the pinboard candidates. We then define the set-level entropy as

$$E(B) = H\Big(\frac{1}{|B|}\sum_{b \in B} v_b\Big), \qquad (1)$$

where $H(x)$ is the entropy of a distribution $x$. We select the best set of pinboards by $\arg\max_{B} E(B)$. We can show that this problem is NP-complete, so an approximation algorithm is needed in general; in our experiments, however, we simply enumerate all possibilities by brute force, restricting the number of pinboard candidates $|\mathcal{B}|$ and the size of the recommendation set $|B|$.
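The brute-force selection can be sketched as follows. This assumes that the set-level entropy in Eq. (1) is the entropy of the averaged cluster distribution of the boards in the subset (so a set whose boards cover different clusters scores higher); the function names and toy distributions are hypothetical.

```python
import math
from itertools import combinations


def entropy(p):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def set_entropy(distributions):
    """Entropy of the average cluster distribution of a set of boards."""
    n = len(distributions)
    mixture = [sum(column) / n for column in zip(*distributions)]
    return entropy(mixture)


def select_boards(candidates, size):
    """Enumerate every subset of `size` boards and keep the one with maximal set entropy.

    candidates: dict board_id -> cluster distribution (list of floats summing to 1).
    """
    best = max(
        combinations(sorted(candidates), size),
        key=lambda subset: set_entropy([candidates[b] for b in subset]),
    )
    return set(best)


candidates = {
    "b1": [1.0, 0.0, 0.0],
    "b2": [0.9, 0.1, 0.0],  # visually redundant with b1
    "b3": [0.0, 0.0, 1.0],
    "b4": [0.0, 1.0, 0.0],
}
print(select_boards(candidates, size=3))
```

The number of subsets grows combinatorially with $|\mathcal{B}|$, which is why both $|\mathcal{B}|$ and $|B|$ must be kept small for exhaustive enumeration.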


4. EXPERIMENTS 

4.1 Data Collection 

4.1.1 Pinterest ontology construction and refining 

An ontology is a natural way to model the hierarchical structure of the categories in Pinterest, as also used in [2] to organize items in the fashion domain. To construct a full Pinterest ontology, we first build a preliminary ontology based on Wikipedia categories, whose root nodes are 20 categories manually selected according to the 38 original categories defined by Pinterest, such as “Fashion”, “Food” and “Wedding”. We then prune the ontology to adapt it to the distribution of Pinterest user interests: categories with low term frequency in board and pin information are removed. Since board information and pin information carry different weights for the ontology, we prune with each source separately, and obtain the pruned ontology as the union of the two results. In addition, popularity metrics such as the numbers of followers, likes, comments and repins are used as positive weights during pruning. The statistics of the resulting refined Pinterest ontology are shown in Table 1.
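The per-source pruning and union can be sketched as below. This is a simplified sketch: term frequency is a naive case-insensitive substring count, the popularity weighting described above is omitted, and the hierarchy is treated as a flat list of category names. The thresholds are parameters; Section 4.1.4 reports 200 for pin information and 2 for board information on the real data, while the toy example uses smaller values.

```python
def prune_ontology(categories, board_texts, pin_texts, board_threshold, pin_threshold):
    """Keep categories whose term frequency reaches the threshold in either source.

    Term frequency is counted separately in board text and in pin text, and
    the pruned ontology is the union of the categories kept by each source.
    """
    def term_frequency(category, texts):
        # Naive substring count; a real system would tokenize and match terms.
        return sum(text.lower().count(category.lower()) for text in texts)

    kept_by_boards = {c for c in categories
                      if term_frequency(c, board_texts) >= board_threshold}
    kept_by_pins = {c for c in categories
                    if term_frequency(c, pin_texts) >= pin_threshold}
    return kept_by_boards | kept_by_pins


categories = ["cake", "wedding dress", "origami"]
board_texts = ["Wedding dress ideas", "Cake recipes"]
pin_texts = ["chocolate cake", "cake pops", "lace wedding dress"]
print(prune_ontology(categories, board_texts, pin_texts, board_threshold=1, pin_threshold=2))
```

Here "wedding dress" survives only through the board source and "cake" through both, illustrating why the union keeps categories that either source supports.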

4.1.2 Map pinboards to topic ontology 

With the refined Pinterest ontology, we can model a user's Pinterest profile as a topic distribution. Since users have their own boards, we obtain this representation by mapping their boards to the categories in the ontology. For each board, the mapping takes three steps. First, we match the board information to the categories; this yields few or even no matches (though these are more accurate), because board information is usually sparse. Second, we match the information of all pins on the board. Third, we concatenate the results of the previous two steps. After obtaining the category mapping for each board, we represent each user by the union of the categories of the user's boards.
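The three mapping steps and the per-user union can be sketched as follows. The sketch assumes simple case-insensitive substring matching of category names and interprets "concatenate" as a set union of the board-level and pin-level matches; the function names and toy boards are hypothetical.

```python
def match_categories(text, categories):
    """Categories whose name occurs in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return {c for c in categories if c in lowered}


def map_board(board_info, pin_infos, categories):
    """Map one board to categories from its own information and its pins."""
    from_board = match_categories(board_info, categories)  # step 1: sparse but precise
    from_pins = set()
    for pin_info in pin_infos:  # step 2: information of every pin on the board
        from_pins |= match_categories(pin_info, categories)
    return from_board | from_pins  # step 3: combine both results


def user_representation(boards, categories):
    """Union of the categories of all boards owned by a user."""
    labels = set()
    for board_info, pin_infos in boards:
        labels |= map_board(board_info, pin_infos, categories)
    return labels


categories = {"food", "wedding", "fashion"}
boards = [
    ("Wedding ideas", ["rustic centerpiece", "bridal fashion week"]),
    ("Recipes", ["easy food prep"]),
]
print(user_representation(boards, categories))
```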

4.1.3 Twitter timeline features 

We extract two kinds of features to represent the tweets of each user. The first is a text feature, for which feature hashing [9] and tf-idf term weighting are applied to reduce the feature dimension and weight features appropriately. Second, we compute 64 LIWC features (e.g., word categories such as “social” and “work”) to model the personality of each user [5]. These features may reflect the distribution of user interests and user-curated data (e.g., Pinterest boards).
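The hashed tf-idf text feature can be sketched with the standard library. This sketch makes several assumptions not stated in the paper: whitespace tokenization, an unsigned hash bucket (the hashing trick of [9] also uses a sign hash, omitted here), the plain idf variant log(N/df), and MD5 as the hash, chosen only because Python's built-in `hash()` for strings is randomized per process. Each document is one user's concatenated timeline.

```python
import hashlib
import math
from collections import Counter


def hashed_counts(tokens, dim):
    """Feature hashing: map each token to one of `dim` buckets with a stable hash."""
    counts = Counter()
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        counts[int(digest, 16) % dim] += 1
    return counts


def tfidf_hashed(timelines, dim=5000):
    """Return one sparse {bucket: tf * idf} vector per user timeline."""
    hashed = [hashed_counts(text.lower().split(), dim) for text in timelines]
    n_docs = len(timelines)
    # Document frequency of each bucket across all users.
    doc_freq = Counter(bucket for counts in hashed for bucket in counts)
    return [
        {bucket: tf * math.log(n_docs / doc_freq[bucket]) for bucket, tf in counts.items()}
        for counts in hashed
    ]


timelines = ["pizza pasta pizza", "pizza runway"]
vectors = tfidf_hashed(timelines)
print(vectors)
```

A bucket that occurs in every timeline (here the one for "pizza") receives zero weight, which is the intended effect of idf weighting.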

4.1.4 Data Statistics and Thresholds 

In the data preparation stage, we filter out accounts that are not linked to a Twitter account or whose Twitter activity is insufficient (fewer than 200 tweets). This leaves 2265 accounts out of an original sample of 50000 Pinterest accounts. For these active accounts, we crawl 4.9 million tweets, 25.5 thousand boards and 2 million pins in total.

In the ontology pruning stage, we set the term frequency threshold for pin information to 200 and the threshold for board information to 1/100 of that value (i.e., 2), since the number of boards is roughly 1/100 of the number of pins. This yields 471 nodes in the ontology, as shown in Table 1.

In the user modeling stage, we build a multi-label representation of each user from the Pinterest data. Table 2 shows standard multi-label statistics of these representations, such as the number of labels, the label cardinality and the label density [8]. Label cardinality (LC) is the average number of labels per example, and label density is LC divided by the number of labels |L|. For the Twitter data, we extract a bag-of-words (BOW) text feature and apply feature hashing to obtain a 5000-dimensional feature. A 64-dimensional LIWC feature is also extracted from the tweets using the LIWC dictionary.

| Examples | Attributes | Labels | Label Cardinality | Label Density |
|---|---|---|---|---|
| 2265 | 5000+64 | 471 | 16.02 | 0.034 |

Table 2: Data statistics

Figure 3: Experiment results varying key parameters in the RAkEL algorithm [8].
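The two statistics in Table 2 follow directly from their definitions: cardinality is the mean number of labels per example, and density divides it by |L|. A small sketch with toy label sets (the function name is hypothetical):

```python
def label_statistics(label_sets, num_labels):
    """Return (label cardinality, label density) for a multi-label dataset."""
    cardinality = sum(len(labels) for labels in label_sets) / len(label_sets)
    density = cardinality / num_labels
    return cardinality, density


users = [{"Food", "Fashion"}, {"Food"}, {"Wedding", "Food", "Fashion"}]
print(label_statistics(users, num_labels=471))
```

As a consistency check on Table 2, a cardinality of 16.02 over |L| = 471 labels gives a density of 16.02 / 471 ≈ 0.034.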
| Feature | F1 macro-ex (BR) | F1 macro-ex (LP) | F1 macro-label (BR) | F1 macro-label (LP) |
|---|---|---|---|---|
| Hashing | 0.114 | 0.143 | 0.006 | 0.021 |
| LIWC | 0.099 | 0.086 | 0.002 | 0.002 |
| Fusion | 0.115 | 0.142 | 0.006 | 0.021 |

Table 3: Experiment results comparing different feature schemes. Note that the total number of labels is 471, so these results are significantly better than random guessing.
4.2 Preliminary Evaluation

We evaluate our approaches with macro-F1, a macro-averaged version of the F-measure that is widely used in multi-label classification tasks [6]. Formally, we represent a set of ground-truth labels $v$ and its prediction $\hat{v}$ as two binary vectors in $\{0,1\}^{L}$. The F-measure (F1), the harmonic mean of precision and recall, then measures the accuracy of the prediction:

$$F_1(v,\hat{v}) = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}},$$

where precision is defined as $|v \wedge \hat{v}| / |\hat{v}|$ and recall as $|v \wedge \hat{v}| / |v|$.

For a multi-label classification task with N examples and L labels, there are two ways to average the F1 scores: (1) F1 macro-ex: the average of the F1 scores of the N examples, which reflects example-based evaluation; (2) F1 macro-label: the average of the F1 scores of the L labels, which reflects label-based evaluation.
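The two averaging schemes differ only in what the F1 score is computed over: each example's label set, or each label's set of examples. A minimal sketch, assuming labels are represented as Python sets and adopting the convention (not stated in the paper) that F1 is 0 when the true and predicted sets share no element, including when both are empty:

```python
def f1_score(true, pred):
    """F1 between a ground-truth set and a predicted set (0 if they do not overlap)."""
    tp = len(true & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(true)
    return 2 * precision * recall / (precision + recall)


def f1_macro_ex(Y, P):
    """Example-based: average F1 over the N examples."""
    return sum(f1_score(y, p) for y, p in zip(Y, P)) / len(Y)


def f1_macro_label(Y, P, labels):
    """Label-based: average F1 over the L labels, each computed over example indices."""
    scores = []
    for label in labels:
        true_idx = {i for i, y in enumerate(Y) if label in y}
        pred_idx = {i for i, p in enumerate(P) if label in p}
        scores.append(f1_score(true_idx, pred_idx))
    return sum(scores) / len(labels)


Y = [{"a", "b"}, {"b"}, {"c"}]
P = [{"a"}, {"b", "c"}, set()]
print(f1_macro_ex(Y, P), f1_macro_label(Y, P, ["a", "b", "c"]))
```

With many rare labels, as with the 471 topics here, label-based averaging is dominated by labels that are never predicted correctly, which tends to make macro-label scores much lower than macro-ex scores.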

We compare two common multi-label classification methods: Binary Relevance (BR) and Label Powerset (LP) [Q]. BR treats the prediction of each label as an independent binary classification task, while LP treats each subset of labels as a single label and performs a multi-class classification task. For efficiency, we adopt the RAkEL method [8], a variant of LP that ensembles lightweight LP classifiers, each trained on a small random subset of the label set.
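The RAkEL ensemble can be sketched as follows: each member draws a random labelset of size k, turns every distinct combination of those labels into one class (Label Powerset), and the members' votes are averaged per label and thresholded. This is a simplified sketch under stated assumptions: the multi-class base learner is a hypothetical 1-nearest-neighbour classifier, and labelsets are drawn independently, so duplicates are possible (the original RAkEL draws distinct labelsets). All names are hypothetical.

```python
import random


def train_rakel(X, Y, labels, k=2, m=3, seed=0):
    """Build m Label Powerset members, each on a random labelset of size k."""
    rng = random.Random(seed)
    models = []
    for _ in range(m):
        subset = frozenset(rng.sample(sorted(labels), k))
        # Label Powerset: each distinct combination of labels in the subset is one class.
        classes = [frozenset(y & subset) for y in Y]
        models.append((subset, list(zip(X, classes))))
    return models


def predict_rakel(models, x, threshold=0.5):
    """Average the members' votes per label and keep labels above the threshold."""
    votes, counts = {}, {}
    for subset, examples in models:
        # 1-NN multi-class prediction using squared Euclidean distance.
        _, predicted = min(examples,
                           key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)))
        for label in subset:
            counts[label] = counts.get(label, 0) + 1
            votes[label] = votes.get(label, 0) + (label in predicted)
    return {label for label in counts if votes[label] / counts[label] > threshold}


X_train = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
Y_train = [{"Food"}, {"Fashion", "Wedding"}, {"Food", "Wedding"}]
models = train_rakel(X_train, Y_train, ["Food", "Fashion", "Wedding"], k=2, m=3)
print(predict_rakel(models, [0.9, 0.1]))
```

The parameters k (labelset size) and m (number of members) play the role of the k and M varied in Figure 3; larger labelsets let each member capture more label dependencies at a higher training cost.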

To evaluate the effect of the personality features (i.e., LIWC features) on this user-interest task, we compare the following three settings: hashing features only, LIWC features only, and an early fusion of both features.

4.3 Experimental Results and Conclusion

Figure 3 presents the comparison between BR and RAkEL when tuning the parameters k and M of the RAkEL method.

Table 3 shows the full experiment results comparing BR and LP with different kinds of features. From these results, we observe that the LP-based method captures label dependencies and performs better than BR, especially when the subset size is large enough to contain sufficient labels. LIWC features alone do not give good results, which indicates that this personality feature does not reflect a user's interest distribution across platforms.

Figure 4: The recommended boards for the topic “Advertisement”. Different kinds of advertisements are recommended, such as “commercial Ads”, “creative” and “vintage”.

Figure 4 shows diversification results qualitatively. These preliminary results qualitatively validate the proposed system. Future work will improve each component, conduct a user study to evaluate the efficiency and effectiveness of the system, and model the dynamic nature of user interests.